On Optimal Probabilities in Stochastic Coordinate Descent Methods
Authors
Peter Richtárik, Martin Takáč
Abstract
We propose and analyze a new parallel coordinate descent method, 'NSync, in which at each iteration a random subset of coordinates is updated, in parallel, with the subsets allowed to be chosen non-uniformly. We derive convergence rates under a strong convexity assumption and discuss how to assign probabilities to the sets so as to optimize the bound. Both in its complexity bound and in practical performance, the method can outperform its uniform variant by an order of magnitude. Surprisingly, the strategy of updating a single randomly selected coordinate per iteration, with optimal probabilities, may require fewer iterations, both in theory and in practice, than the strategy of updating all coordinates at every iteration.
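To make the abstract concrete, here is a minimal sketch of the serial special case it highlights: one coordinate per iteration, sampled with probability proportional to its coordinate-wise Lipschitz constant, a standard bound-optimizing importance choice in this serial setting. The quadratic test problem and all names are illustrative; this is not the paper's implementation of the general parallel method.

```python
import numpy as np

# Non-uniform serial randomized coordinate descent on a strongly convex
# quadratic f(x) = 0.5 * x^T A x - b^T x (an illustrative stand-in).
rng = np.random.default_rng(0)
n = 50
M = rng.standard_normal((n, n))
A = M.T @ M + np.eye(n)        # positive definite -> strongly convex
b = rng.standard_normal(n)

L = np.diag(A).copy()          # coordinate-wise Lipschitz constants L_i = A_ii
p = L / L.sum()                # non-uniform probabilities p_i proportional to L_i
x = np.zeros(n)

for _ in range(20000):
    i = rng.choice(n, p=p)     # sample one coordinate non-uniformly
    g_i = A[i] @ x - b[i]      # partial derivative of f at x along coordinate i
    x[i] -= g_i / L[i]         # exact minimization along coordinate i

x_star = np.linalg.solve(A, b)
print(np.linalg.norm(x - x_star))   # distance to the minimizer
```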
Similar References
Adaptive Probabilities in Stochastic Optimization Algorithms
Stochastic optimization methods have been studied extensively in recent years. In some classification scenarios, such as text-document categorization, unbiased strategies such as uniform sampling can hurt the convergence rate because potential outlier data points distort the estimator. Consequently, more iterations are needed to converge to the optimal value for...
Faster Optimization through Adaptive Importance Sampling
The current state-of-the-art stochastic optimization algorithms (SGD, SVRG, SCD, SDCA, etc.) are based on sampling one active datapoint uniformly at random in each iteration. Changing these probabilities to better reflect the importance of each datapoint is a natural and powerful idea. In this thesis we analyze stochastic coordinate descent methods with fixed non-uniform and adaptive sampling. ...
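As a hedged illustration of the fixed importance-sampling idea for sums over datapoints (not the thesis's code; the least-squares problem and all names are assumptions): each example is sampled with probability proportional to a smoothness proxy, and the sampled gradient is reweighted by 1/(n p_i) so the estimator stays unbiased.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 200, 20
# Rows with wildly different scales, where uniform sampling is wasteful.
A = rng.standard_normal((n, d)) * rng.uniform(0.1, 10.0, size=(n, 1))
x_true = rng.standard_normal(d)
b = A @ x_true                    # consistent system, so SGD can converge exactly

L = np.sum(A**2, axis=1)          # per-example smoothness proxy L_i = ||a_i||^2
p = L / L.sum()                   # fixed non-uniform (importance) probabilities
x = np.zeros(d)
step = 0.5 / L.mean()             # conservative constant step for this sketch

for _ in range(20000):
    i = rng.choice(n, p=p)
    g = (A[i] @ x - b[i]) * A[i]  # gradient of the i-th example's squared loss
    x -= step * g / (n * p[i])    # reweight by 1/(n p_i) to keep it unbiased

print(np.linalg.norm(x - x_true))
```

With these probabilities and this step, the update reduces to an under-relaxed randomized Kaczmarz step, which converges linearly on consistent systems.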
Adaptive Sampling Probabilities for Non-Smooth Optimization
Standard forms of coordinate and stochastic gradient methods do not adapt to structure in data; their good behavior under random sampling is predicated on uniformity of the data. When gradients in certain blocks of features (for coordinate descent) or examples (for SGD) are larger than others, there is natural structure that can be exploited for quicker convergence. Yet adaptive variants...
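A toy illustration of the adaptivity idea (not the cited paper's algorithm, which additionally handles non-smooth objectives and estimates probabilities online): coordinates whose current partial derivatives are larger get sampled more often, here on a smooth quadratic stand-in.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
M = rng.standard_normal((n, n))
A = M.T @ M + np.eye(n)          # strongly convex quadratic stand-in
b = rng.standard_normal(n)
L = np.diag(A).copy()            # coordinate-wise Lipschitz constants
x = np.zeros(n)
g = A @ x - b                    # keep the full gradient as running state

for _ in range(5000):
    w = np.abs(g) / np.sqrt(L)   # favor coordinates with large gradients
    if w.sum() == 0:             # already at the minimizer
        break
    p = w / w.sum()              # adaptive probabilities, refreshed each step
    i = rng.choice(n, p=p)
    delta = -g[i] / L[i]         # exact minimization along coordinate i
    x[i] += delta
    g += A[:, i] * delta         # rank-one gradient refresh, O(n) per step

print(np.linalg.norm(g))         # gradient norm, ~0 at the minimizer
```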
Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
We present and study a distributed optimization algorithm that employs a stochastic dual coordinate ascent method. Stochastic dual coordinate ascent methods enjoy strong theoretical guarantees and often outperform stochastic gradient descent methods on regularized loss minimization problems, yet they have received little study in a distributed framework. We ...
Optimal quantization methods and applications to numerical problems in finance
We review optimal quantization methods for numerically solving nonlinear problems in high dimension associated with Markov processes. Quantization of a Markov process consists in a spatial discretization on finite grids optimally fitted to the dynamics of the process. Two quantization methods are proposed: the first one, called marginal quantization, relies on an optimal approximation of the ...
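The building block of marginal quantization is fitting, at each date, a finite grid that minimizes the mean squared quantization error of the marginal law. A minimal sketch using Lloyd's fixed-point iteration on simulated draws (a one-dimensional Gaussian is an illustrative stand-in; the literature also uses related stochastic procedures such as CLVQ):

```python
import numpy as np

rng = np.random.default_rng(3)
samples = rng.standard_normal(100_000)   # stand-in for the law of X at one date
grid = np.linspace(-2.0, 2.0, 8)         # initial N = 8 quantization grid

for _ in range(50):                      # Lloyd iterations
    # Assign each sample to its nearest grid point (Voronoi cells of the grid).
    idx = np.abs(samples[:, None] - grid[None, :]).argmin(axis=1)
    # Move each grid point to the conditional mean of its cell.
    for j in range(grid.size):
        cell = samples[idx == j]
        if cell.size:
            grid[j] = cell.mean()

print(np.sort(grid))                     # grid locally minimizing E|X - q(X)|^2
```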
Journal: Optimization Letters
Volume 10, Issue –
Pages: –
Published: 2016